abstract policy
Robby is Not a Robber (anymore): On the Use of Institutions for Learning Normative Behavior
Tomic, Stevan, Pecora, Federico, Saffiotti, Alessandro
We show how norms can be used to guide a reinforcement learning agent towards achieving normative behavior, and how the same set of norms can be applied over different domains. Thus, we are able to: (1) provide a way to intuitively encode social knowledge (through norms); (2) guide learning towards normative behaviors (through an automatic norm reward system); (3) achieve a transfer of learning by abstracting policies; and, finally, (4) remain independent of any particular RL algorithm. We show how our approach can be seen as a means to achieve abstract representation and learn procedural knowledge based on the declarative semantics of norms, and discuss possible implications of this in some areas of cognitive science.
Index Terms: Norms, Institutions, Automatic Reward Shaping, Transfer of Learning, Abstract Policies, Abstraction, State-Space Selection, Schema
Topics: Robots; Agents (Representation & Reasoning); Reinforcement Learning; Cognitive Science
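The automatic norm reward system described here is algorithm-agnostic; a minimal sketch of the idea (with hypothetical names, not the paper's actual norm formalism) is a reward wrapper that penalizes norm violations:

```python
# Sketch of norm-based automatic reward shaping (hypothetical API).
# A norm is a predicate over states paired with a penalty; the shaped
# reward subtracts a penalty for every violated norm, leaving the
# underlying RL algorithm unchanged.

def shaped_reward(base_reward, state, norms):
    """norms: list of (predicate, penalty) pairs."""
    return base_reward - sum(penalty for predicate, penalty in norms if predicate(state))

# Example norm for a grid world: do not enter restricted cells.
no_trespass = (lambda s: s in {(2, 3), (2, 4)}, 5.0)
```

Because the shaping touches only the reward signal, the same norm set can in principle be reused across domains by swapping in domain-specific state predicates.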
Tractability of Planning with Loops
Srivastava, Siddharth (University of California, Berkeley) | Zilberstein, Shlomo (University of Massachusetts Amherst) | Gupta, Abhishek (University of California, Berkeley) | Abbeel, Pieter (University of California, Berkeley) | Russell, Stuart (University of California, Berkeley)
We create a unified framework for analyzing and synthesizing plans with loops for solving problems with non-deterministic numeric effects and a limited form of partial observability. Three different action models---with deterministic, qualitative non-deterministic and Boolean non-deterministic semantics---are handled using a single abstract representation. We establish the conditions under which the correctness and termination of solutions, represented as abstract policies, can be verified. We also examine the feasibility of learning abstract policies from examples. We demonstrate our techniques on several planning problems and show that they apply to challenging real-world tasks such as doing the laundry with a PR2 robot. These results resolve a number of open questions about planning with loops and facilitate the development of new algorithms and applications.
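As a toy illustration of the kind of looping plan studied here (a hypothetical sketch, not the paper's framework): when each iteration strictly decreases a non-negative quantity, termination can be verified even without knowing the exact numeric effect of the action.

```python
def loopy_plan(count, act):
    """Sketch of a plan with a loop over a non-deterministic numeric
    quantity: repeat an action until the observed count reaches zero.
    If `act` strictly decreases the count, the loop provably terminates
    regardless of how much each application decreases it by."""
    steps = 0
    while count > 0:
        count = act(count)
        steps += 1
    return steps
```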
Comparative Analysis of Abstract Policies to Transfer Learning in Robotics Navigation
Freire, Valdinei (Universidade de São Paulo) | Costa, Anna Helena Reali (Universidade de São Paulo)
Reinforcement learning enables a robot to learn behavior through trial and error. However, knowledge is usually built from scratch and learning may take a long time. Many approaches have been proposed to transfer the knowledge learned in one task and reuse it in a new, similar task to speed up learning in the target task. A very effective form of knowledge to transfer is an abstract policy, which generalizes the policies learned in source tasks to extend the domain of tasks that can reuse them. There are inductive and deductive methods to generate abstract policies. However, there is a lack of deeper analysis assessing not only the effectiveness of each type of policy, but also the way in which each policy is used to accelerate learning in a new task. In this paper we propose two simple inductive methods and use a deductive method to generate stochastic abstract policies from source tasks. We also propose two strategies to use the abstract policy during learning in a new task: the hard strategy and the soft strategy. We make a comparative analysis between the three types of policies and the two strategies of use in a robotic navigation domain. We show that these techniques are effective in improving the agent's learning performance, especially during the early stages of the learning process, when the agent is completely unaware of the new task.
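The soft strategy can be sketched roughly as follows (hypothetical names; the paper's exact definition may differ): instead of exploring uniformly at random, the agent samples its exploratory actions from the transferred abstract policy's action distribution.

```python
import random

def soft_choose(q_values, abstract_dist, epsilon=0.2):
    """Soft-strategy action selection (sketch): exploit the learned
    Q-values most of the time, but bias exploration by sampling from the
    abstract policy's action distribution rather than a uniform one.
    `abstract_dist` maps each action to its probability under the
    transferred stochastic abstract policy."""
    actions = list(q_values)
    if random.random() < epsilon:
        return random.choices(actions, weights=[abstract_dist[a] for a in actions])[0]
    return max(actions, key=q_values.get)
```

A hard strategy would instead follow the abstract policy's recommendation outright in states it covers; the soft variant only tilts exploration, so it degrades gracefully when the abstract policy is a poor fit for the new task.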
Seeing the Forest Despite the Trees: Large Scale Spatial-Temporal Decision Making
Crowley, Mark, Nelson, John, Poole, David L
We introduce a challenging real-world planning problem where actions must be taken at each location in a spatial area at each point in time. We use forestry planning as the motivating application. In Large Scale Spatial-Temporal (LSST) planning problems, the state and action spaces are defined as the cross-products of many local state and action spaces spread over a large spatial area such as a city or forest. These problems possess state uncertainty, have complex utility functions involving spatial constraints and we generally must rely on simulations rather than an explicit transition model. We define LSST problems as reinforcement learning problems and present a solution using policy gradients. We compare two different policy formulations: an explicit policy that identifies each location in space and the action to take there; and an abstract policy that defines the proportion of actions to take across all locations in space. We show that the abstract policy is more robust and achieves higher rewards with far fewer parameters than the elementary policy. This abstract policy is also a better fit to the properties that practitioners in LSST problem domains require for such methods to be widely useful.
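The contrast between the explicit and the abstract policy can be sketched like this (hypothetical names): the abstract policy stores a single action-proportion vector shared by every location, so its parameter count is |actions| rather than |locations| x |actions|.

```python
import random

def abstract_spatial_policy(locations, proportions):
    """Sketch of the abstract-policy idea for LSST problems: one small
    parameter vector (the proportion of each action) is shared across
    all locations, instead of an explicit per-location action table.
    Each location's action is sampled according to the proportions."""
    actions = list(proportions)
    weights = [proportions[a] for a in actions]
    return {loc: random.choices(actions, weights=weights)[0] for loc in locations}
```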
Simultaneous Abstract and Concrete Reinforcement Learning
Matos, Tiago (Universidade de São Paulo) | Bergamo, Yannick P. (Universidade de São Paulo) | Silva, Valdinei Freire da (Universidade de São Paulo) | Cozman, Fabio G. (Universidade de São Paulo) | Costa, Anna Helena Reali (Universidade de São Paulo)
Suppose an agent builds a policy that satisfactorily solves a decision problem; suppose further that some aspects of this policy are abstracted and used as starting point in a new, different decision problem. How can the agent accrue the benefits of the abstract policy in the new concrete problem? In this paper we propose a framework for simultaneous reinforcement learning, where the abstract policy helps start up the policy for the concrete problem, and both policies are refined through exploration. We report experiments that demonstrate that our framework is effective in speeding up policy construction for practical problems.
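One simple way for an abstract policy to "start up" a concrete one (a sketch under assumed names, not necessarily the authors' exact method) is to seed the concrete Q-table with an optimistic bonus on the action the abstract policy recommends in each state's abstract image; ordinary RL updates then refine both levels through exploration.

```python
def init_q_from_abstract(states, actions, abstract_policy, abstraction, bonus=1.0):
    """Sketch: seed a concrete Q-table from a transferred abstract policy.
    Each concrete state is mapped through `abstraction`; the action the
    abstract policy recommends there gets a small optimistic bonus, so the
    agent initially prefers it but can unlearn it from experience."""
    q = {}
    for s in states:
        recommended = abstract_policy.get(abstraction(s))
        q[s] = {a: (bonus if a == recommended else 0.0) for a in actions}
    return q
```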
Generalizing and Categorizing Skills in Reinforcement Learning Agents Using Partial Policy Homomorphisms
Rajendran, Srividhya (The University of Texas at Arlington) | Huber, Manfred (The University of Texas at Arlington)
A reinforcement learning agent involved in life-long learning in a complex and dynamic environment has to have the ability to utilize control knowledge acquired in one situation in novel contexts. As part of this, it is important for the learning agent not only to be able to learn a new skill for a specific instance of a task, but also to identify similar tasks, form a reusable skill and representational abstractions for the corresponding "task type", and apply these abstractions in new, previously unseen contexts. This paper presents a new approach to policy generalization that derives an abstract policy for a set of similar tasks (a "task type") by constructing a partial policy homomorphism from a set of basic policies learned for previously seen task instances. The resulting generalized policy can then be applied in new contexts to address new instances of similar tasks. As opposed to many recent approaches in lifelong learning systems, this approach makes it possible to identify similar tasks based on the functional characteristics of the corresponding skills, and provides a means of transferring the learned knowledge to new situations without requiring complete knowledge of the state space and the system dynamics in the new environment. To illustrate the new policy generalization method and to demonstrate its ability to reuse the gained knowledge in new contexts, it is applied to a set of grid world examples.
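The reuse step can be sketched as lifting the generalized policy through a state abstraction (hypothetical names; the paper constructs the homomorphism from learned base policies). Because the homomorphism is partial, states the abstraction cannot map simply signal that the agent must fall back to learning there.

```python
def lift_policy(abstract_policy, state_abstraction):
    """Sketch: apply a generalized (abstract) policy in a new task by
    mapping each concrete state through a state abstraction and looking
    up the abstract action. Unmapped states (where the partial
    homomorphism is undefined) yield None."""
    def policy(state):
        abstract_state = state_abstraction(state)
        return abstract_policy.get(abstract_state)
    return policy
```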